Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry RPM cloning on http/502 error (bad gateway) #7373

Closed
wants to merge 1 commit into from

Conversation

reubeno
Copy link
Member

@reubeno reubeno commented Jan 22, 2024

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./SPECS/LICENSES-AND-NOTICES/data/licenses.json, ./SPECS/LICENSES-AND-NOTICES/LICENSES-MAP.md, ./SPECS/LICENSES-AND-NOTICES/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

We continue to intermittently see HTTP 502 errors (Bad Gateway) when tdnf download'ing packages from packages.microsoft.com, usually in build pipelines. This PR introduces granular retry logic for that specific failure case. (We can't retry on any HTTP error, as 404 is an expected error that shows up in normal execution.)

Change Log

Update the rpmrepcloner to inspect stderr on tdnf download failure. If the specific known error string indicating a 502 error is found, then we retry with exponential backoff.

Does this affect the toolchain?

No

Test Methodology

(Testing in progress)

@microsoft-github-policy-service microsoft-github-policy-service bot added the main PR Destined for main label Jan 22, 2024
Comment on lines +703 to +708
for _, line := range strings.Split(stderr, "\n") {
if strings.Contains(line, "Error: 502 when downloading") {
logger.Log.Warn("Encountered possibly intermittent HTTP 502 error.")
retriable = true
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Do we only want to deal with 502s or all 5XX errors? Since we're here, it might be worth handling all server-side issues.
  2. What do you think about dropping the for loop in favour of a regular expression?
Suggested change
for _, line := range strings.Split(stderr, "\n") {
if strings.Contains(line, "Error: 502 when downloading") {
logger.Log.Warn("Encountered possibly intermittent HTTP 502 error.")
retriable = true
}
}
serverErrorMatch := serverErrorsRegex.FindStringSubmatch(stderr)
if len(serverErrorMatch) > errorCodeIndex {
logger.Log.Warn("Encountered possibly intermittent HTTP %d error.", serverErrorMatch[errorCodeIndex])
retriable = true
}

The regex and the index could be defined as global, private variables (making it global forces a one-time compilation of the regex during program's start-up):

// In global context:
var (
(...)
serverErrorsRegex = regexp.MustCompile(`(?m)Error: (5\d{2}) when downloading`)
errorCodeIndex = 1
...

Comment on lines +605 to +607
//
// *NOTE* These values are copied from downloader/downloader.go; they need not be the same but seemed like a
// good enough starting point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment here just for the sake of the draft? It looks like something that we may not want in the final version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually appreciate suggestions of how best to share these -- or perspectives on whether they should be? (Should we create a wrapper with these defaults?)

Perhaps you or @dmcilvaney have thoughts on that?

downloadErr, retriable := tdnfDownload(finalArgs...)
if downloadErr != nil {
if retriable {
logger.Log.Warnf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change that to a debug entry. These messages tend to generate noise, confuse people who don't know why they are generated (which is most users) and are red herrings when we debug issues.

Suggested change
logger.Log.Warnf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)
logger.Log.Debugf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)

Comment on lines +682 to +686
if strings.HasPrefix(trimmedLine, toyboxConflictsPrefix) {
logger.Log.Warn("Ignoring known toybox conflict")
err = nil
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've seen a question about toybox and if we still need this check. It looks like toybox is not present in 2.0, so I think we can remove this logic completely.

@PawelWMS PawelWMS marked this pull request as ready for review January 23, 2024 21:09
@PawelWMS PawelWMS requested a review from a team as a code owner January 23, 2024 21:09
PawelWMS added a commit that referenced this pull request Jan 24, 2024
@PawelWMS
Copy link
Contributor

Moving the PR to #7445.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
main PR Destined for main
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants